Imputing missing covariate values for the Cox model
Abstract
Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete-case analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear. We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H_0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H_0(T), we approximate it by the Nelson-Aalen estimator of H(T) or estimate it by Cox regression. We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson-Aalen estimator of H(T) in the imputation model.
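The recommended imputation model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `nelson_aalen` and `impute_normal_x` are hypothetical, the data are simulated, and the parameter draw for the linear imputation model follows the usual Bayesian linear regression recipe, which the abstract does not specify. The sketch computes the Nelson-Aalen estimate of H(T) for each subject and then imputes a Normal covariate X from a linear regression on D, H(T), and Z.

```python
import numpy as np


def nelson_aalen(time, event):
    """Nelson-Aalen estimate of the cumulative hazard H(T) at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time)  # distinct observed times, ascending
    # Events at each distinct time, and number still at risk just before it
    d = np.array([event[time == t].sum() for t in uniq])
    n = np.array([(time >= t).sum() for t in uniq])
    cumhaz = np.cumsum(d / n)
    # Map the step function back onto each subject's time
    return cumhaz[np.searchsorted(uniq, time)]


def impute_normal_x(x, d, h, z, rng):
    """One imputed copy of Normal X from a linear regression on D, H(T), and Z.

    Assumption: parameters are drawn from their approximate posterior so that
    repeated calls give proper multiple imputations.
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    design = np.column_stack([np.ones(len(x)), d, h, z])
    A, y = design[~miss], x[~miss]
    beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta_hat
    df = A.shape[0] - A.shape[1]
    # Draw the residual variance, then the coefficients given that variance
    sigma2 = resid @ resid / rng.chisquare(df)
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(A.T @ A))
    x_imp = x.copy()
    x_imp[miss] = design[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return x_imp


# Simulated example (all values below are illustrative assumptions)
rng = np.random.default_rng(0)
n_subjects = 200
z = rng.normal(size=n_subjects)
x_full = 0.5 * z + rng.normal(size=n_subjects)
# Proportional hazards with constant baseline hazard: rate = exp(0.5*X + 0.3*Z)
event_time = rng.exponential(1.0 / np.exp(0.5 * x_full + 0.3 * z))
cens_time = rng.exponential(2.0, size=n_subjects)
t = np.minimum(event_time, cens_time)
d = (event_time <= cens_time).astype(int)
# Make about 30% of X missing completely at random
x_obs = np.where(rng.random(n_subjects) < 0.3, np.nan, x_full)

h = nelson_aalen(t, d)
imputations = [impute_normal_x(x_obs, d, h, z, rng) for _ in range(5)]
```

Each element of `imputations` is one completed copy of X; in a full analysis the Cox model would be fitted to each completed data set and the estimates combined with Rubin's rules. For a binary X, the same predictors would enter a logistic rather than a linear imputation model.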
Similar resources
A Simulation Study Comparing Two Methods of Handling Missing Covariate Values when Fitting a Cox Proportional-Hazards Regression Model
Missing covariate values are a common problem in survival data research. The aim of this study is to compare the use of the multiple imputation (MI) and last observation carried forward (LOCF) methods for handling missing covariate values in the Cox proportional hazards (PH) regression model. The comparisons between the methods are based on simulated data. The missingness mechanism is assumed ...
Maximum likelihood analysis of the logistic regression model when predictor data are incomplete but auxiliary variables are available
Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data; they have generally focused on missing outcome values. But covariate values can also be missing. Materials and Methods: In this paper we study the missing imputation by the EM algorithm and auxiliary varia...
Missing covariates in competing risks analysis
Studies often follow individuals until they fail from one of a number of competing failure types. One approach to analyzing such competing risks data involves modeling the cause-specific hazards as functions of baseline covariates. A common issue that arises in this context is missing values in covariates. In this setting, we first establish conditions under which complete case analysis (CCA) i...
Multiple imputation of covariates by substantive model compatible fully conditional specification
Multiple imputation (MI) is a practical, principled approach to handling missing data. When used to impute missing values in covariates of regression models, imputation models may be mis-specified if they are not compatible with the substantive model of interest for the outcome. In this article we introduce the smcfcs command, which imputes covariates by substantive model compatible fully condi...
Asymptotic Theory for the Cox Model with Missing Time-dependent Covariate
The relationship between a time-dependent covariate and survival times is usually evaluated via the Cox model. Time-dependent covariates are generally available as longitudinal data collected regularly during the course of the study. A frequent problem, however, is the occurrence of missing covariate data. A recent approach to estimation in the Cox model in this case jointly models survival and ...